Statistical Translation, Heat Kernels and Expected Distances
Authors
Abstract
High dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from high variance and performs poorly in general. We explore novel connections between statistical translation, heat kernels on manifolds and graphs, and expected distances. These connections provide a new framework for unsupervised metric learning for text documents. Experiments indicate that the resulting distances are generally superior to their more standard counterparts.
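As a rough illustration of the idea (a minimal sketch, not the paper's exact construction), the snippet below smooths two term-frequency histograms with a hypothetical word-to-word translation matrix T before comparing them, which is one simple way to trade the variance of raw histograms for bias. The function name, the toy matrix, and the use of a plain Euclidean distance on the smoothed histograms are assumptions for illustration.

```python
import numpy as np

def translation_smoothed_distance(x, y, T):
    """Distance between two word histograms after smoothing with a
    word-to-word translation matrix T (rows sum to 1).

    x, y : term-frequency histograms over the same vocabulary, shape (V,)
    T    : row-stochastic matrix; T[w, w'] is the probability that word w
           'translates' to word w' (e.g. estimated from co-occurrence).
    """
    x = x / x.sum()          # normalize counts to probability vectors
    y = y / y.sum()
    xs = x @ T               # expected histogram of a translated document
    ys = y @ T
    return np.linalg.norm(xs - ys)

# toy example with a 3-word vocabulary
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
x = np.array([2.0, 0.0, 1.0])   # raw term counts for document 1
y = np.array([0.0, 2.0, 1.0])   # raw term counts for document 2
print(translation_smoothed_distance(x, y, T))
```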
Similar Resources
Information Diffusion Kernels
A new family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. Based on the heat equation on the Riemannian manifold defined by the Fisher information metric, information diffusion kernels generalize the Gaussian kernel of Euclidean space, and provide a natural way of combining generative statistical modeling with non-parametric discr...
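A frequently cited special case is the multinomial family, where the Fisher geodesic distance has a closed form and the heat kernel is approximated by its leading Gaussian term. The sketch below follows that reading; the omitted normalizing constant and the function name are illustrative assumptions.

```python
import numpy as np

def multinomial_diffusion_kernel(theta, theta_prime, t):
    """Approximate heat-kernel value between two points on the multinomial
    simplex (i.e. two normalized word histograms).

    Uses the Fisher geodesic distance
        d(theta, theta') = 2 * arccos( sum_i sqrt(theta_i * theta'_i) )
    plugged into the leading Gaussian term exp(-d^2 / (4t)); the constant
    normalizing factor is omitted since it cancels in most uses.
    """
    bc = np.sum(np.sqrt(theta * theta_prime))   # Bhattacharyya coefficient
    bc = np.clip(bc, 0.0, 1.0)                  # numerical safety for arccos
    d = 2.0 * np.arccos(bc)                     # Fisher geodesic distance
    return np.exp(-d * d / (4.0 * t))

theta = np.array([0.5, 0.3, 0.2])
theta_prime = np.array([0.4, 0.4, 0.2])
print(multinomial_diffusion_kernel(theta, theta_prime, t=0.1))
```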
Accurate and Efficient Computation of Laplacian Spectral Distances and Kernels
This paper introduces the Laplacian spectral distances, as a function that resembles the usual distance map, but exhibits properties (e.g., smoothness, locality, invariance to shape transformations) that make them useful to processing and analyzing geometric data. Spectral distances are easily defined through a filtering of the Laplacian eigenpairs and reduce to the heat diffusion, wave, bi-har...
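As a minimal sketch of the filtering construction (not the paper's accurate and efficient algorithm), a spectral distance on a graph can be written as a filtered sum over Laplacian eigenpairs. The dense eigendecomposition, the combinatorial Laplacian, and the example filter below are assumptions for illustration; with filt(lam) = exp(-2*t*lam) the quantity reduces to the familiar diffusion distance.

```python
import numpy as np

def spectral_distance(W, i, j, filt):
    """Spectral distance between nodes i and j of a graph with adjacency W,
    defined by a filter applied to the Laplacian eigenvalues:

        d(i, j)^2 = sum_k filt(lambda_k) * (phi_k[i] - phi_k[j])^2
    """
    D = np.diag(W.sum(axis=1))
    L = D - W                                # combinatorial graph Laplacian
    lam, phi = np.linalg.eigh(L)             # eigenpairs (lam[k], phi[:, k])
    diffs = phi[i, :] - phi[j, :]
    return np.sqrt(np.sum(filt(lam) * diffs ** 2))

# path graph on 4 nodes, diffusion-type filter with t = 0.5
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(spectral_distance(W, 0, 3, lambda lam: np.exp(-2 * 0.5 * lam)))
```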
Diffusion Kernels on Statistical Manifolds
A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomia...
Expected Sequence Similarity Maximization
This paper presents efficient algorithms for expected similarity maximization, which coincides with minimum Bayes decoding for a similarity-based loss function. Our algorithms are designed for similarity functions that are sequence kernels in a general class of positive definite symmetric kernels. We discuss both a general algorithm and a more efficient algorithm applicable in a common unambigu...
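A hedged sketch of the underlying decision rule follows, using a simple shared-n-gram count as the sequence kernel; the paper's general kernel class and its efficient algorithms are not reproduced here, and the helper names and the restriction of the expectation to the hypothesis list itself are illustrative assumptions.

```python
from collections import Counter

def ngram_kernel(a, b, n=2):
    """Count of shared n-grams between two token sequences (a simple
    positive definite symmetric sequence kernel)."""
    ca = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    cb = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    return sum(ca[g] * cb[g] for g in ca)

def expected_similarity_decode(hypotheses, posteriors, n=2):
    """Return the hypothesis maximizing its expected n-gram similarity to a
    sequence drawn from the posterior, i.e. minimum Bayes decoding under a
    similarity-based (negative) loss."""
    def expected_sim(h):
        return sum(p * ngram_kernel(h, r, n)
                   for r, p in zip(hypotheses, posteriors))
    return max(hypotheses, key=expected_sim)

hyps = [["the", "cat", "sat"], ["the", "cat", "sits"], ["a", "dog", "sat"]]
post = [0.5, 0.3, 0.2]
print(expected_similarity_decode(hyps, post))
```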
Evolution of dispersal distance: maternal investment leads to bimodal dispersal kernels.
Since dispersal research has mainly focused on the evolutionary dynamics of dispersal rates, it remains unclear what shape evolutionarily stable dispersal kernels have. Yet, detailed knowledge about dispersal kernels, quantifying the statistical distribution of dispersal distances, is of pivotal importance for understanding biogeographic diversity, predicting species invasions, and explaining r...
Journal:
Volume / Issue:
Pages: -
Publication date: 2007